
Optimal Sketching for Trace Estimation

Neural Information Processing Systems

Matrix trace estimation is ubiquitous in machine learning applications and has traditionally relied on Hutchinson's method, which requires $O(\log(1/\delta)/\epsilon^2)$ matrix-vector product queries to achieve a $(1 \pm \epsilon)$-multiplicative approximation to $\text{trace}(A)$ with failure probability $\delta$ on positive-semidefinite input matrices $A$. Recently, the Hutch++ algorithm was proposed, which reduces the number of matrix-vector queries from $O(1/\epsilon^2)$ to the optimal $O(1/\epsilon)$, and the algorithm succeeds with constant probability. However, in the high probability setting, the non-adaptive Hutch++ algorithm suffers an extra $O(\sqrt{\log(1/\delta)})$ multiplicative factor in its query complexity. Non-adaptive methods are important, as they correspond to sketching algorithms, which are mergeable, highly parallelizable, and provide low-memory streaming algorithms as well as low-communication distributed protocols. In this work, we close the gap between non-adaptive and adaptive algorithms, showing that even non-adaptive algorithms can achieve $O(\sqrt{\log(1/\delta)}/\epsilon + \log(1/\delta))$ matrix-vector products. In addition, we prove matching lower bounds demonstrating that, up to a $\log \log(1/\delta)$ factor, no further improvement in the dependence on $\delta$ or $\epsilon$ is possible by any non-adaptive algorithm. Finally, our experiments demonstrate the superior performance of our sketch over the adaptive Hutch++ algorithm, which is less parallelizable, as well as over the non-adaptive Hutchinson's method.
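The two estimators contrasted in the abstract can be sketched in a few lines of NumPy. Below is plain Hutchinson's method together with a textbook (adaptive) Hutch++ variant: project out an approximate top eigenspace found from a random range sketch, take its trace exactly, and run Hutchinson only on the residual. This is shown for intuition under stated assumptions (a random PSD test matrix, a one-third/one-third/one-third query split); it is not the paper's non-adaptive sketch.

```python
import numpy as np

def hutchinson(matvec, n, m, rng):
    """Hutchinson's estimator: average of g^T A g over m Rademacher vectors g."""
    total = 0.0
    for _ in range(m):
        g = rng.choice([-1.0, 1.0], size=n)  # random sign query vector
        total += g @ matvec(g)
    return total / m

def hutch_pp(A, m, rng):
    """Adaptive Hutch++ sketch: exact trace on a rank-k range sketch of A,
    plus Hutchinson on the projected-out residual. Uses m matvecs total."""
    n = A.shape[0]
    k = m // 3
    S = rng.standard_normal((n, k))
    Q, _ = np.linalg.qr(A @ S)           # orthonormal basis for a range sketch
    t_top = np.trace(Q.T @ A @ Q)        # exact trace of the captured part

    def resid_mv(v):
        # One matvec with (I - QQ^T) A (I - QQ^T)
        w = A @ (v - Q @ (Q.T @ v))
        return w - Q @ (Q.T @ w)

    return t_top + hutchinson(resid_mv, n, m - 2 * k, rng)

# Example on a random PSD matrix A = B B^T
rng = np.random.default_rng(0)
n = 200
B = rng.standard_normal((n, n))
A = B @ B.T
true_tr = np.trace(A)
est_h = hutchinson(lambda v: A @ v, n, 500, rng)
est_hpp = hutch_pp(A, 90, rng)
```

The variance reduction from handling the top eigenspace exactly is what lets Hutch++-style methods reach a given relative error with far fewer matrix-vector products than plain Hutchinson on PSD inputs.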


Near Optimal Sketching of Low-Rank Tensor Regression

Li, Xingguo, Haupt, Jarvis, Woodruff, David

Neural Information Processing Systems

This problem is motivated by the fact that the number of parameters in $\Theta$ is only $R \cdot \sum_{d=1}^{D} p_d$, which is significantly smaller than the $\prod_{d=1}^{D} p_d$ number of parameters in ordinary least squares regression. We consider the above CP decomposition model of tensors $\Theta$, as well as the Tucker decomposition. We obtain a significantly smaller dimension and sparsity in the randomized linear mapping $\Phi$ than is possible for ordinary least squares regression. Finally, we give a number of numerical simulations supporting our theory.
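The parameter-count comparison in the abstract is easy to make concrete. The sketch below computes $R \cdot \sum_{d=1}^{D} p_d$ versus $\prod_{d=1}^{D} p_d$ for illustrative mode sizes and rank (the values of `dims` and `R` are assumptions, not from the paper), and builds a rank-$R$ CP tensor from its factor matrices to show where the smaller count comes from.

```python
import numpy as np
from math import prod

def cp_param_count(R, dims):
    """Parameters in a rank-R CP model: one R x p_d factor matrix per mode."""
    return R * sum(dims)

def ols_param_count(dims):
    """Parameters in unstructured least squares over the full tensor."""
    return prod(dims)

dims = (30, 40, 50)  # hypothetical mode sizes p_1, ..., p_D
R = 5                # hypothetical CP rank

cp_n = cp_param_count(R, dims)   # 5 * (30 + 40 + 50) = 600
ols_n = ols_param_count(dims)    # 30 * 40 * 50 = 60000

# A rank-R CP tensor is fully determined by its D factor matrices:
rng = np.random.default_rng(1)
factors = [rng.standard_normal((R, p)) for p in dims]
Theta = np.einsum('ri,rj,rk->ijk', *factors)  # sum of R rank-1 outer products
```

Here the full tensor has 60,000 entries but only 600 free parameters, which is the gap a sketching matrix $\Phi$ tailored to the low-rank structure can exploit.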